Rationale

We assume that a binary function \(Y_i(t)\) is generated by a latent Gaussian function \(\eta_i(t)\), as follows:

\[Y_i(t) \sim \mathrm{Bernoulli}\left(\frac{\exp(\eta_i(t))}{1+\exp(\eta_i(t))}\right)\] \[\eta_i(t) = \beta_0(t)+b_i(t)\]

This latent function can be approximated with a Karhunen–Loève expansion:

\[\eta_i(t) = f_0(t)+\sum_{k=1}^{K}\xi_k\phi_k(t)+\epsilon_i(t)\]

Toy simulation

Simulation set-up

  • For a random sample i:

\[\eta_i(t) =f_0(t)+ \xi_{i1}\sin(2\pi t)+\xi_{i2}\cos(2\pi t)+\xi_{i3}\sin(4\pi t)+\xi_{i4}\cos(4\pi t)\]

where \(\boldsymbol{\xi}_{i} \sim N(0, \boldsymbol{\Gamma})\).

\[Y_i(t) \sim \mathrm{Bernoulli}\left(\frac{\exp(\eta_i(t))}{1+\exp(\eta_i(t))}\right)\]

In this simulation, we set \(f_0(t) = 0\), and \(\boldsymbol{\Gamma}\) is a diagonal matrix with diagonal elements (1, 0.5, 0.25, 0.125).
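As a concrete reference, here is a small Python sketch of this set-up (the document's own analysis is in R; the grid of 1000 time points and N = 100 subjects are assumptions for illustration):

```python
# Toy simulation of the fGFPCA generating model: eta_i = sum_k xi_ik phi_k,
# Y_i(t) ~ Bernoulli(expit(eta_i(t))). Grid size and N are made up.
import numpy as np

rng = np.random.default_rng(1)
N, T = 100, 1000
t = np.linspace(0, 1, T)

# Eigenfunctions and true eigenvalues (diagonal of Gamma)
Phi = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),
                       np.sin(4 * np.pi * t), np.cos(4 * np.pi * t)])
gamma = np.array([1, 0.5, 0.25, 0.125])

xi = rng.normal(0, np.sqrt(gamma), size=(N, 4))  # scores xi_i ~ N(0, Gamma)
eta = xi @ Phi.T                                 # latent functions (f0 = 0)
p = 1 / (1 + np.exp(-eta))                       # expit(eta)
Y = rng.binomial(1, p)                           # binary outcomes Y_i(t)
```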

Question 1: in the FPCA framework, \(\boldsymbol{\Gamma}\) is a diagonal matrix of eigenvalues. However, in my implementation, the estimated eigenvalues are much larger than the true \(\boldsymbol{\Gamma}\). I am not sure why this happens. Does it have to do with the GLMM step, or with the decreased density when binning observations?

Local GLMMs

The entire time grid is binned into 100 small intervals of equal length. The cut is made on the order index of the measurement time, not on the actual time itself. Each bin is labeled by its midpoint (5, 15, 25, …, 995).

Within each interval labeled as \(s\), we fit a local GLMM model:

\[logit(E(Y_i^s)) = \beta_0^s+b_i^s\] Then we use this series of models to estimate the latent Gaussian function. Each subject has a different value of the latent function at each bin.
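To make the binning step concrete, here is a hedged Python sketch. It does not fit the per-bin GLMM (that is done in R in this document); instead, as a crude stand-in for \(\hat{\eta}_i^s\), it uses the empirical logit per subject and bin with a 0.5 continuity correction:

```python
# Binning by order index, then empirical logit per subject-bin as a rough
# surrogate for the local-GLMM latent estimate (an assumed simplification).
import numpy as np

def bin_empirical_logit(Y, n_bins=100):
    """Y: N x T binary matrix; bins cut on the index of measurement time."""
    N, T = Y.shape
    width = T // n_bins
    eta_hat = np.empty((N, n_bins))
    for s in range(n_bins):
        block = Y[:, s * width:(s + 1) * width]  # observations in bin s
        h = block.sum(axis=1)                    # successes per subject
        n = block.shape[1]
        eta_hat[:, s] = np.log((h + 0.5) / (n - h + 0.5))
    return eta_hat

Y = (np.random.default_rng(0).random((5, 1000)) < 0.5).astype(int)
eta_hat = bin_empirical_logit(Y)
```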

Question 2: When fitting local GLMMs, the intervals are treated as essentially independent; we do not consider the effect of neighboring intervals on a specific interval. Does this mean that correlation between repeated measures from the same subject is introduced only by the latent function?

The figure below shows the “mean” latent function across subjects, which is essentially \(\beta_0^s\) from the series of mixed models.

The figure below shows the estimated latent functions for subjects 1–4:

FPCA

Here we fit FPCA on the estimated latent function from the previous step:

Follow-up on question 1: It looks like the ranges of the estimated PC functions differ from those of the true PC functions. Maybe some of their variation is absorbed by the scores, causing \(\boldsymbol{\Gamma}\) to be much greater than its true value?

In-sample prediction

Now let’s try to predict future outcomes based on observations \(\boldsymbol{Y}_m\) up to \(t_m\).

For in-sample prediction, we have observed the full outcome track and thus have estimated the full latent Gaussian function \(\eta(s)\) on the binned grid. For now, we will take the quantities up to \(t_m\) and re-estimate the scores with empirical Bayes:

\[\hat{\boldsymbol{\xi}} = E(\boldsymbol{\xi}|\boldsymbol{Y}_m) = \boldsymbol{\hat{\Gamma}\Phi ^T_m}(\boldsymbol{\Phi_m\hat{\Gamma}\Phi^T_m}+\hat{\sigma_{\epsilon}}^2\boldsymbol{I_m})^{-1}(\hat{\boldsymbol{\eta}}_m-\hat{\boldsymbol{f}}_0^m)\] Where:

  • \(\boldsymbol{\hat{\Gamma}}\) is the diagonal covariance matrix of the scores, with the estimated eigenvalues on the diagonal
  • \(\boldsymbol{\Phi}_m\) is the matrix of eigenfunctions up to \(t_m\). These should be constant across all subjects.
  • \(\hat{\sigma_{\epsilon}}^2\) is the residual variance from FPCA
  • \(\hat{\boldsymbol{\eta}}_m\) is the estimated latent Gaussian function up to \(t_m\)
  • \(\hat{\boldsymbol{f}}_0^m\) is the FPCA mean function up to \(t_m\)

And then, we use the FPCA model for point prediction on the entire binned grid:

\[\boldsymbol{\hat{\eta}} = \hat{\boldsymbol{f}}_0+\boldsymbol{\Phi}\boldsymbol{\hat{\xi}}\]

Here, \(\boldsymbol{\Phi}\) is the eigenfunctions on the entire binned grid. For \(\hat{\boldsymbol{\eta}}\), even though it is calculated for the entire binned grid, we will take values after \(t_m\).
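The score update and point prediction above can be sketched numerically as follows; all quantities below (eigenfunctions, variances, latent values, grid sizes) are made-up toys, not estimates from this analysis:

```python
# Empirical-Bayes scores from the observed bins, then point prediction of
# eta on the whole binned grid: eta_hat = f0 + Phi xi_hat.
import numpy as np

rng = np.random.default_rng(2)
K, m, S = 4, 40, 100                          # PCs, observed bins, total bins
Gamma_hat = np.diag([1, 0.5, 0.25, 0.125])
Phi = rng.normal(size=(S, K))                 # eigenfunctions, full binned grid
Phi_m = Phi[:m]                               # rows up to t_m
sigma2 = 0.1                                  # residual variance from FPCA
f0 = np.zeros(S)                              # FPCA mean function (toy: zero)
eta_m = rng.normal(size=m)                    # estimated latent values up to t_m

H = Gamma_hat @ Phi_m.T @ np.linalg.inv(Phi_m @ Gamma_hat @ Phi_m.T
                                        + sigma2 * np.eye(m))
xi_hat = H @ (eta_m - f0[:m])                 # E(xi | Y_m)
eta_hat = f0 + Phi @ xi_hat                   # prediction on the entire grid
```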

Let

\[\boldsymbol{H} = \boldsymbol{\hat{\Gamma}\Phi ^T_m}(\boldsymbol{\Phi_m\hat{\Gamma}\Phi^T_m}+\hat{\sigma_{\epsilon}}^2\boldsymbol{I_m})^{-1}\]

The variance of \(\hat{\boldsymbol{\eta}}\) can be estimated by:

\[Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})=\boldsymbol{\Phi}(\hat{\boldsymbol{\Gamma}}-\boldsymbol{H}\boldsymbol{\Phi_m}\hat{\boldsymbol{\Gamma}})\boldsymbol{\Phi}^T+\hat{\sigma_{\epsilon}}^2\boldsymbol{I}_S\]

It should be an \(S \times S\) matrix; the diagonal entries are variances at specific times and the off-diagonal entries are covariances between different times on the binned grid.
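A quick numeric check that this formula does produce a symmetric \(S \times S\) matrix, again with toy quantities in place of the fitted FPCA objects:

```python
# Prediction variance Var(eta_hat - eta) = Phi (Gamma - H Phi_m Gamma) Phi' +
# sigma2 I on the full binned grid, with made-up dimensions and inputs.
import numpy as np

rng = np.random.default_rng(3)
K, m, S = 4, 40, 100
Gamma_hat = np.diag([1, 0.5, 0.25, 0.125])
Phi = rng.normal(size=(S, K))                 # eigenfunctions, full grid
Phi_m = Phi[:m]                               # rows up to t_m
sigma2 = 0.1

H = Gamma_hat @ Phi_m.T @ np.linalg.inv(Phi_m @ Gamma_hat @ Phi_m.T
                                        + sigma2 * np.eye(m))
V = Phi @ (Gamma_hat - H @ Phi_m @ Gamma_hat) @ Phi.T + sigma2 * np.eye(S)
```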

Below we try to do that with a few subjects:

Out of sample prediction

Now suppose we have a new subject \(\boldsymbol{Y}\) with observations up to \(t_m\), where \(t_m\) falls in the mth bin. There seems to be more than one way to predict the remaining track. The first approach that came to mind was to 1) refit the local GLMMs to estimate an incomplete latent function; 2) refit the FPCA with this additional incomplete latent function track; and 3) repeat the score estimation from the in-sample procedure.

However, if we wish to make predictions without re-estimating any model (neither the local GLMMs nor the FPCA), we need to use the fitted FPCA model (specifically, \(\boldsymbol{\hat{f}}_0\), \(\boldsymbol{\Phi}\), \(\boldsymbol{\hat{\Gamma}}\), \(\hat{\sigma}_{\epsilon}\)) to estimate the scores for these new observations (out-of-sample scores). I can think of two ways of doing this:

  1. Empirical Bayes: similar to in-sample procedure, estimate the posterior mean
  2. MLE, as often used in longitudinal models.

Below I experimented with both procedures.

Empirical Bayes

Say that on the binned grid, at each interval \(s\), the new subject has \(n_s\) observations, among which \(h_s\) are successes: \(h_s = \sum_{j=1}^{n_s} I(Y_{sj}=1)\), where \(j\) indexes observations within a specific interval.

Now let’s find the empirical Bayes estimator of the scores, \(E(\boldsymbol{\xi}|\boldsymbol{Y})\), which is unlikely to have a closed-form solution.

\[E(\boldsymbol{\xi}|\boldsymbol{Y})=\int \boldsymbol{\xi} p(\boldsymbol{\xi}|\boldsymbol{Y})d\boldsymbol{\xi}\] \[p(\boldsymbol{\xi}|\boldsymbol{Y}) = \frac{p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})}{\int p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})d\boldsymbol{\xi}}\]

Thus:

\[E(\boldsymbol{\xi}|\boldsymbol{Y})=\frac{\int \boldsymbol{\xi}p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})d\boldsymbol{\xi}}{\int p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})d\boldsymbol{\xi}}\]

We know that FPCA scores follow a multivariate normal distribution:

\[p(\boldsymbol{\xi}) = (2\pi)^{-K/2}\det(\boldsymbol{\Gamma})^{-1/2}\exp(-\boldsymbol{\xi^T \Gamma^{-1} \xi}/2)\]

And the binary outcome, conditional on score, follows a Binomial distribution:

\[\begin{aligned} p(\boldsymbol{Y}|\boldsymbol{\xi}) &= \prod_{s=1}^{m}p(\boldsymbol{Y}_s|\boldsymbol{\xi}) \\ & = \prod_{s=1}^{m} \binom{n_s}{h_s}p_s^{h_s}(1-p_s)^{n_s-h_s}\\ logit(p_s) & = \hat{\eta}_s = \hat{f}_{0}(s)+\phi(s)\boldsymbol{\xi} \end{aligned}\]

Follow-up on question 2: Please note that here we assume the \(\boldsymbol{Y}_s\) are mutually independent given the fPCA scores. That is, all correlation across time within the same subject is introduced by the underlying eigenfunctions.

It is very difficult to obtain a closed-form solution for this formula, so I tried to approximate the integrals with numerical integration (the adaptIntegrate function from the cubature package). Here I ran into a computation-time problem: numerical integration was very, very slow, taking roughly an hour for just the denominator of one subject. The function values are also very small.
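One cheaper alternative to deterministic cubature (an assumption on my part, not what the document does) is self-normalized Monte Carlo: draw scores from the prior \(N(0, \boldsymbol{\Gamma})\), weight each draw by the Binomial likelihood \(p(\boldsymbol{Y}|\boldsymbol{\xi}_r)\), and average. A Python sketch with toy eigenfunctions and data:

```python
# Self-normalized Monte Carlo estimate of the posterior mean:
# E(xi | Y) ~= sum_r w_r xi_r / sum_r w_r, with xi_r ~ prior and
# w_r = p(Y | xi_r). All inputs below are made up for illustration.
import numpy as np

rng = np.random.default_rng(4)
K, m = 4, 50
gamma = np.array([1, 0.5, 0.25, 0.125])
Phi_m = rng.normal(size=(m, K))                # toy eigenfunctions up to t_m
n_s = np.full(m, 10)                           # observations per bin
true_xi = rng.normal(0, np.sqrt(gamma))
h_s = rng.binomial(n_s, 1 / (1 + np.exp(-(Phi_m @ true_xi))))

draws = rng.normal(0, np.sqrt(gamma), size=(20000, K))  # xi_r ~ N(0, Gamma)
eta = draws @ Phi_m.T                                   # R x m latent values
loglik = (h_s * eta - n_s * np.log1p(np.exp(eta))).sum(axis=1)
w = np.exp(loglik - loglik.max())                       # stabilized weights
xi_eb = (w[:, None] * draws).sum(axis=0) / w.sum()      # posterior mean
```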

MLE

We maximize the same posterior density of the scores (conditional on the observations):

\[p(\boldsymbol{\xi}|\boldsymbol{Y}) = \frac{p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})}{\int p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi})d\boldsymbol{\xi}}\]

But now we no longer need to calculate the marginal distribution of \(\boldsymbol{Y}\) (the denominator), since it is constant with respect to \(\boldsymbol{\xi}\):

\[\begin{aligned} p(\boldsymbol{\xi}|\boldsymbol{Y}) & \propto p(\boldsymbol{Y}|\boldsymbol{\xi})p(\boldsymbol{\xi}) \\ l(\boldsymbol{\xi}|\boldsymbol{Y}) & \propto \log p(\boldsymbol{Y}|\boldsymbol{\xi})+\log p(\boldsymbol{\xi}) \end{aligned}\]

Plug in the conditional log-likelihood of \(\boldsymbol{Y}\) and marginal log-likelihood of \(\boldsymbol{\xi}\):

\[\begin{aligned} l(\boldsymbol{\xi}|\boldsymbol{Y}) & \propto \sum_{s=1}^m h_s\eta_s-\sum_{s=1}^m n_s\log(1+\exp(\eta_s))-\boldsymbol{\xi^T\Gamma^{-1}\xi}/2\\ \frac{dl(\boldsymbol{\xi}|\boldsymbol{Y})}{d\boldsymbol{\xi}}& =\sum_{s=1}^m h_s\phi(s)-\sum_{s=1}^m n_s\frac{\exp(\eta_s)}{1+\exp(\eta_s)}\phi(s)-\boldsymbol{\xi^T\Gamma^{-1}} = 0 \end{aligned}\]

It is not easy to get a closed-form solution for the score equation above, so we can look for a numerical solution instead (the multiroot function from the rootSolve package).
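The document solves this score equation with rootSolve::multiroot in R; an equivalent sketch in Python minimizes the negative log posterior with its analytic gradient (toy data throughout):

```python
# MAP scores: minimize the negative log posterior whose gradient is the
# (negated) score equation above. Inputs are made up for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
K, m = 4, 50
gamma = np.array([1, 0.5, 0.25, 0.125])
Gamma_inv = np.diag(1 / gamma)
Phi_m = rng.normal(size=(m, K))
n_s = np.full(m, 10)
h_s = rng.binomial(n_s, 1 / (1 + np.exp(-(Phi_m @ rng.normal(0, np.sqrt(gamma))))))

def neg_log_post(xi):
    eta = Phi_m @ xi
    return -(h_s @ eta - n_s @ np.log1p(np.exp(eta))) + 0.5 * xi @ Gamma_inv @ xi

def grad(xi):
    eta = Phi_m @ xi
    p = 1 / (1 + np.exp(-eta))
    return -Phi_m.T @ (h_s - n_s * p) + Gamma_inv @ xi   # negative score

fit = minimize(neg_log_post, np.zeros(K), jac=grad, method="BFGS")
xi_map = fit.x
```

The objective is convex (Binomial log-likelihood plus a Gaussian penalty), so BFGS reliably finds the unique mode here.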

Laplace approximation

It seems that \(E(\boldsymbol{\xi}|\boldsymbol{Y})\) and the mode of \(p(\boldsymbol{\xi}|\boldsymbol{Y})\) can be connected through the Laplace approximation. Specifically, the posterior mean under the Laplace approximation equals the posterior mode. In this sense, the two methods above are essentially equivalent.

Below I experimented with LaplacesDemon::LaplaceApproximation: https://search.r-project.org/CRAN/refmans/LaplacesDemon/html/LaplaceApproximation.html
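For reference, a minimal Python sketch of the Laplace approximation itself (not the LaplacesDemon implementation): the posterior is approximated by a Gaussian centred at the mode, with covariance equal to the inverse Hessian of the negative log posterior at that mode, so the approximate posterior mean is the mode. All inputs are toys.

```python
# Laplace approximation: posterior ~ N(xi_map, Sigma), Sigma = Hessian^{-1}.
# The Hessian here is analytic: Phi' diag(n_s p_s (1-p_s)) Phi + Gamma^{-1}.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
K, m = 4, 50
gamma = np.array([1, 0.5, 0.25, 0.125])
Gamma_inv = np.diag(1 / gamma)
Phi_m = rng.normal(size=(m, K))
n_s = np.full(m, 10)
h_s = rng.binomial(n_s, 0.5)                   # toy binary summaries

def neg_log_post(xi):
    eta = Phi_m @ xi
    return -(h_s @ eta - n_s @ np.log1p(np.exp(eta))) + 0.5 * xi @ Gamma_inv @ xi

xi_map = minimize(neg_log_post, np.zeros(K)).x  # posterior mode = Laplace mean
p = 1 / (1 + np.exp(-(Phi_m @ xi_map)))
Hess = Phi_m.T @ (Phi_m * (n_s * p * (1 - p))[:, None]) + Gamma_inv
Sigma = np.linalg.inv(Hess)                     # approximate posterior covariance
```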

In-/Out-of-sample prediction results of latent function

Predictive performance measures

This section evaluates the performance of out-of-sample prediction of individual tracks. Our reference method is a Generalized Linear Mixed Model fitted with adaptive Gaussian quadrature (the GLMMadaptive package), with a fixed effect of time and a subject-level random intercept and slope:

\[g(E(Y_i)) = \beta_0+\beta_1t+b_{i0}+b_{i1}t\] We use the following metrics for evaluation:

  1. Integrated squared error
  • Evaluates the latent Gaussian function
  • For a single subject, evaluated only at unobserved bin midpoints

\[\frac{1}{N}\sum_{i=1}^N\sum_{s=m+1}^S(\hat{\eta}_{is}-\eta_{is})^2\]

  2. Average AUC
  • Evaluates the binary outcome function
  • AUC for a single subject is calculated over the unobserved time points on the original un-binned grid
  • Predicted probabilities between bin midpoints are obtained by interpolation
  • Note that the grid ends at the maximum bin midpoint, instead of the maximum observed time, due to how the interpolation works
  3. Computation time

This is a bit more complicated. The full computation time consists of two parts:

  • Time spent on model fitting
  • Time spent on prediction given the fitted FPCA or adaptive GLMM models

The table below includes only the time spent on prediction.

  Maximum observed time   ISE: fGFPCA   ISE: GLMMadaptive   AUC: fGFPCA   AUC: GLMMadaptive
  195                     71.328        156.269             0.686         0.538
  395                     24.033        101.512             0.709         0.499
  595                     3.543         81.784              0.699         0.423
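The two accuracy metrics can be sketched as follows (toy inputs; the rank-based AUC below is an assumed stand-in for whatever AUC routine was actually used):

```python
# ISE: mean over subjects of summed squared error over unobserved bins.
# AUC: probability a random positive outranks a random negative (ties = 1/2).
import numpy as np

def ise(eta_hat, eta_true, m):
    """Mean over subjects of the summed squared error after bin m."""
    d = eta_hat[:, m:] - eta_true[:, m:]
    return (d ** 2).sum(axis=1).mean()

def auc(y, p):
    """Rank-based AUC for binary labels y and predicted probabilities p."""
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

rng = np.random.default_rng(7)
eta_hat = rng.normal(size=(5, 100))
eta_true = rng.normal(size=(5, 100))
ise_val = ise(eta_hat, eta_true, m=40)

y = np.array([0, 0, 1, 1])
p = np.array([0.1, 0.4, 0.35, 0.8])
auc_val = auc(y, p)   # 3 of the 4 positive-negative pairs are ordered correctly
```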

Prediction interval

With an estimate of scores \(\hat{\boldsymbol{\xi}}\)

\[\boldsymbol{\hat{\eta}}=\boldsymbol{\hat{f_0}}+\boldsymbol{\Phi\hat{\xi}}\] \[\begin{aligned} \hat{\boldsymbol{\eta}}-\boldsymbol{\eta}&=\boldsymbol{\hat{f_0}}+\boldsymbol{\Phi\hat{\xi}}-\boldsymbol{\eta}\\ &= \boldsymbol{\hat{f_0}-f_0}+\boldsymbol{\Phi(\hat{\xi}-\xi)}-\boldsymbol{\epsilon} \end{aligned}\]

\[Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})=Var(\boldsymbol{\hat{f_0}-f_0})+\boldsymbol{\Phi Var(\hat{\xi}-\xi) \Phi^T}+Var(\epsilon)\]

  • The three components should be mutually independent?
  • Through some derivation, I think \(Var(\hat{\boldsymbol{f}}_0)=\frac{Var(\boldsymbol{Y})}{N} = \frac{1}{N}[\boldsymbol{\Phi\Gamma \Phi^T}+\sigma_{\epsilon}^2\boldsymbol{I}]\).
  • The variance of \(\hat{\boldsymbol{\xi}}\) will depend on the method used for its estimation. But isn’t the estimation done conditioning on \(Var(\boldsymbol{\xi}) = \boldsymbol{\Gamma}\)?
  • \(Var(\epsilon) =\hat{\sigma}_{\epsilon}^2\) from FPCA
  • May need to circle back to the in-sample estimation and dig into where that formula came from

Below are some potential methods:

  1. Estimate the variance with the empirical Bayes estimator of the Gaussian process
  • Essentially the same formula as the in-sample prediction interval

\[\boldsymbol{H} = \boldsymbol{\hat{\Gamma}\Phi ^T_m}(\boldsymbol{\Phi_m\hat{\Gamma}\Phi^T_m}+\hat{\sigma_{\epsilon}}^2\boldsymbol{I_m})^{-1}\]

\[Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})=\boldsymbol{\Phi}(\hat{\boldsymbol{\Gamma}}-\boldsymbol{H}\boldsymbol{\Phi_m}\hat{\boldsymbol{\Gamma}})\boldsymbol{\Phi}^T+\hat{\sigma_{\epsilon}}^2\boldsymbol{I}_S\]

  • This estimate does not change across subjects?
  • It also does not change from in-sample to out-of-sample?
  • It ignores randomness from the FPCA estimates (that is, \(\boldsymbol{f}_0\), \(\boldsymbol{\Gamma}\) and \(\sigma_{\epsilon}\)).
  2. Fisher information matrix
  • Based on the posterior likelihood
  • Also ignores randomness from the FPCA estimates

\[Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})=\boldsymbol{\Phi}I(\hat{\boldsymbol{\xi}})^{-1}\boldsymbol{\Phi}^T+\hat{\sigma_{\epsilon}}^2\boldsymbol{I}_S \]

  • The standard deviation would change across subjects.
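Whichever variance estimate is used, a pointwise 95% interval for the latent function follows from the diagonal of \(Var(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})\); a minimal sketch, with an arbitrary positive-definite matrix standing in for that variance:

```python
# Pointwise 95% normal prediction interval from a variance matrix V.
# V and eta_hat below are made-up placeholders, not fitted quantities.
import numpy as np

rng = np.random.default_rng(8)
S = 100
A = rng.normal(size=(S, S))
V = A @ A.T / S + 0.1 * np.eye(S)      # any S x S positive-definite matrix
eta_hat = rng.normal(size=S)           # stand-in point prediction

se = np.sqrt(np.diag(V))
lower, upper = eta_hat - 1.96 * se, eta_hat + 1.96 * se
```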

In-/Out-of-sample PC scores

Here I explored the estimated scores, comparing them with the true scores from the simulation set-up. The figures actually didn’t look biased to me. Perhaps it is an artifact of the small functional range; I need to think about that.

Next steps:

  • Interval prediction for MLE
  • Bias of PC scores
  • FPCA eigenfunctions: how to extend to the original grid? Interpolation or projection?